Conversation
Signed-off-by: Kai Xu <kaix@nvidia.com>
Codecov Report: ✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##             main   #1011    +/-   ##
==========================================
- Coverage   72.12%  70.11%   -2.01%
==========================================
  Files         209     221      +12
  Lines       23628   25459    +1831
==========================================
+ Hits        17042   17851     +809
- Misses       6586    7608    +1022
Signed-off-by: Meng Xin <mxin@nvidia.com>
Added a separate PTQ skill; it needs further tuning. Claude Opus can follow the skill, but Sonnet needs more guidance.
Signed-off-by: Kai Xu <kaix@nvidia.com>
Force-pushed from 18eb9c2 to 6968ad6
Signed-off-by: Meng Xin <mxin@nvidia.com>
Force-pushed from bd2d3da to 4f61bad
Copy the nel-assistant skill as a local evaluation skill so we can extend it to support the evaluation requirements of optimized models. Update the modelopt orchestrator to reference the evaluation skill. Signed-off-by: Kai Xu <kaix@nvidia.com>
Force-pushed from 4f61bad to 28928a1
Add deployment skill (vLLM, SGLang, TRT-LLM serving) and update the modelopt orchestrator to support three pipelines:
- PTQ only
- PTQ + Deploy (serve as an API endpoint)
- PTQ + Evaluate (accuracy benchmark)
Signed-off-by: Kai Xu <kaix@nvidia.com>
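For the PTQ + Deploy pipeline, a minimal sketch of serving an exported checkpoint as an OpenAI-compatible endpoint with vLLM (the checkpoint path and model name below are hypothetical, and the `--quantization modelopt` backend is assumed to match the exported checkpoint format):

```shell
# Serve a ModelOpt-exported quantized checkpoint with vLLM
# (./my-model-nvfp4 is a hypothetical path).
vllm serve ./my-model-nvfp4 --quantization modelopt --port 8000

# Smoke-test the OpenAI-compatible endpoint.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "./my-model-nvfp4", "prompt": "Hello", "max_tokens": 8}'
```

The SGLang and TRT-LLM pipelines would expose the same style of endpoint, so downstream benchmarking and evaluation can stay serving-backend agnostic.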
Force-pushed from 3a320f6 to 5c46798
Signed-off-by: Meng Xin <mxin@nvidia.com>
Thanks. The skills are still at an early stage, so it'd be great to get more people using them and giving feedback. Testing across a broader set of models and optimization recipes will help us iterate quickly and make the workflows more robust.
What does this PR do?
Type of change: ?
Adds a Claude Code skill suite for interactive model optimization with ModelOpt. The skills guide users through an end-to-end workflow: optimize the model with ModelOpt APIs, deploy it on vLLM and benchmark speed, and evaluate accuracy with NeMo Evaluator (nel).
Usage
Invoke the skill in Claude Code:
/ptq
Then say which model you want to quantize and with which quantization spec, e.g. "nvfp4, MLP only".
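To build intuition for what a spec like "nvfp4, MLP only" implies, here is a small, self-contained illustration of blockwise 4-bit floating-point fake quantization in the spirit of NVFP4. This is not ModelOpt's implementation: the E2M1 magnitude grid and the block size of 16 match the NVFP4 format, but the plain-float per-block scale (real NVFP4 stores FP8 block scales) and the rounding rule are simplifying assumptions.

```python
# Illustrative only -- not ModelOpt's implementation.
# Representable magnitudes of E2M1, the 4-bit float format behind NVFP4.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def fake_quantize_block(block):
    """Fake-quantize one block: pick a scale that maps the block's max
    magnitude onto E2M1's max (6.0), snap each value to the nearest
    representable magnitude, then rescale back to the original range."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return list(block)
    scale = amax / 6.0
    out = []
    for x in block:
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        out.append(mag * scale if x >= 0 else -mag * scale)
    return out

def fake_quantize(values, block_size=16):
    """Apply blockwise fake quantization (NVFP4 uses blocks of 16)."""
    return [y for i in range(0, len(values), block_size)
            for y in fake_quantize_block(values[i:i + block_size])]

weights = [0.03, -0.9, 0.31, 0.62, -1.2, 0.05, 0.0, 2.4]
print(fake_quantize(weights))
# ~ [0.0, -0.8, 0.4, 0.6, -1.2, 0.0, 0.0, 2.4] (up to float rounding)
```

The per-value quantization error (e.g. -0.9 becoming -0.8) is what calibration and recipe choices such as "MLP only" trade off against speed and memory.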
Testing
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (
git commit -s -S).Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded
trust_remote_code=True,torch.load(..., weights_only=False),pickle, etc.).CONTRIBUTING.md: ✅ / ❌ / N/AAdditional Information